Clean Speech

Noisy Speech

Four noise types were added artificially to the clean speech:

  • car

  • engine

  • street noise

  • background talkers
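This page does not say how the noise was mixed in. As an illustration only, the sketch below shows one common way to add a noise recording to clean speech at a target signal-to-noise ratio (SNR). The function names `power` and `mix_at_snr` are hypothetical, and the SNR levels in the usage loop are the ones listed in the appendix tables (10, 5, 0, -5, -10 dB).

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper: an assumed mixing procedure, not necessarily the
    one used to create the samples on this page.
    """
    clean = list(clean)
    noise = list(noise)
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")

    # Tile the noise if it is shorter than the speech, then trim to length.
    repeats = -(-len(clean) // len(noise))  # ceiling division
    noise = (noise * repeats)[:len(clean)]

    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")

    # Solve 10*log10(p_clean / (gain**2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    # Synthetic stand-ins for a speech signal and a noise recording.
    clean = [math.sin(0.1 * i) for i in range(1000)]
    noise = [math.cos(0.37 * i) for i in range(300)]
    for snr_db in (10, 5, 0, -5, -10):
        noisy = mix_at_snr(clean, noise, snr_db)
        print(snr_db, len(noisy))
```

Scaling the noise rather than the speech keeps the speech level constant across SNR conditions, which is a design choice of this sketch and not something the page confirms.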

Speech Generation

Generating speech with EPG signals only


Generating speech with EPG signals and noisy speech signals in early fusion mode


Generating speech with EPG signals and noisy speech signals in late fusion mode


Speech Enhancement

Using audio signal only

  • car

  • engine

  • street noise

  • background talkers

Combining EPG and audio signal in early fusion

  • car

  • engine

  • street noise

  • background talkers

Combining EPG and audio signal in late fusion

  • car

  • engine

  • street noise

  • background talkers

Appendix

Appendix I. Comparison of different speech enhancement approaches (p-values; EF: early fusion, LF: late fusion)

| | Noisy - Baseline | Baseline - EF | Baseline - LF | EF - LF |
|---|---|---|---|---|
| PESQ | 0.000 | 0.000 | 0.007 | 0.189 |
| STOI | 0.000 | 0.000 | 0.000 | 0.842 |
| ESTOI | 0.000 | 0.000 | 0.000 | 0.585 |
| MCD | 0.000 | 0.000 | 0.000 | 0.000 |
| SSNR | 0.000 | 0.061 | 0.000 | 0.073 |


Appendix II. Comparison of noisy speech and baseline in speech enhancement at each SNR level (p-values)

| SNR | PESQ | STOI | ESTOI | MCD | SSNR |
|---|---|---|---|---|---|
| 10 dB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 dB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0 dB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| -5 dB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| -10 dB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |


Appendix III. Comparison of EF and LF in speech enhancement at each SNR level (p-values)

| SNR | PESQ | STOI | ESTOI | MCD | SSNR |
|---|---|---|---|---|---|
| 10 dB | 0.000 | 0.227 | 0.019 | 0.355 | 0.125 |
| 5 dB | 0.008 | 0.406 | 0.117 | 0.018 | 0.687 |
| 0 dB | 0.240 | 0.586 | 0.866 | 0.001 | 0.043 |
| -5 dB | 0.909 | 0.758 | 0.118 | 0.000 | 0.001 |
| -10 dB | 0.127 | 0.018 | 0.000 | 0.000 | 0.001 |